
    On the distribution of time-to-proof of mathematical conjectures

    What is the productivity of Science? Can we measure an evolution of the production of mathematicians over history? Can we predict the waiting time until the proof of a challenging conjecture such as the P-versus-NP problem? Motivated by these questions, we revisit a suggestion published recently and debated in the "New Scientist" that the historical distribution of times-to-proof, i.e., of waiting times between the formulation of a mathematical conjecture and its proof, can be quantified and gives meaningful insights into the future development of still-open conjectures. We find, however, evidence that the mathematical process of creation is too non-stationary, with too little data and too few constraints, to allow for a meaningful conclusion. In particular, the unsteady, approximately exponential growth of the human population, and arguably that of mathematicians, essentially hides the true distribution. Another issue is the incompleteness of the available dataset. In conclusion, we cannot really reject the simplest model of an exponential rate of conjecture proof with a rate of 0.01/year for the dataset we have studied, translating into an average waiting time to proof of 100 years. We hope that the presented methodology, which combines the mathematics of recurrent processes, linking proved and still-open conjectures, with different empirical constraints, will be useful for other similar investigations probing the productivity associated with mankind's growth and creativity.
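
    The benchmark mentioned above is a constant-rate (exponential) model of conjecture proofs at roughly 0.01/year, i.e. a mean waiting time of about 100 years. A minimal sketch of how such a rate can be estimated while linking proved and still-open conjectures is given below; the toy data and variable names are illustrative assumptions, using the standard right-censored maximum-likelihood estimator for an exponential rate (number of proofs divided by total exposure time).

    ```python
    import numpy as np

    # Hypothetical toy data (assumptions, not the paper's dataset):
    # waiting times (years) from formulation to proof for proved conjectures,
    # and current ages of still-open conjectures, treated as right-censored
    # observations of the time-to-proof.
    proved_times = np.array([7.0, 33.0, 58.0, 105.0, 142.0])
    open_ages = np.array([50.0, 120.0, 160.0])

    # Under a constant-rate (exponential) model, the censored maximum-likelihood
    # estimate of the proof rate is: number of proofs / total exposure time.
    exposure = proved_times.sum() + open_ages.sum()
    rate_hat = len(proved_times) / exposure
    print(f"estimated proof rate: {rate_hat:.4f} per year")
    print(f"implied mean time-to-proof: {1.0 / rate_hat:.0f} years")
    ```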

    High quality topic extraction from business news explains abnormal financial market volatility

    Understanding the mutual relationships between information flows and social activity in society today is one of the cornerstones of the social sciences. In financial economics, the key issue in this regard is understanding and quantifying how news of all possible types (geopolitical, environmental, social, financial, economic, etc.) affects trading and the pricing of firms in organized stock markets. In this article, we seek to address this issue by analyzing more than 24 million news records provided by Thomson Reuters and their relationship with trading activity for 206 major stocks in the S&P US stock index. We show that the whole landscape of news that affects stock price movements can be automatically summarized via simple regularized regressions between trading activity and news information pieces decomposed, with the help of simple topic modeling techniques, into their "thematic" features. Using these methods, we are able to estimate and quantify the impact of news on trading. We introduce network-based visualization techniques to represent the whole landscape of news information associated with a basket of stocks. The examination of the words that are representative of the topic distributions confirms that our method is able to extract the significant pieces of information influencing the stock market. Our results show that one of the most puzzling stylized facts in financial economics, namely that at certain times trading volumes appear to be "abnormally large," can be partially explained by the flow of news. In this sense, our results prove that there is no "excess trading" when one restricts attention to times when the news is genuinely novel and provides relevant financial information.
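
    A hedged sketch of the general pipeline described above (news decomposed into topic features, then a regularized regression onto trading activity): the toy corpus, the use of LDA and Lasso, and all variable names are illustrative assumptions rather than the paper's actual implementation.

    ```python
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import Lasso

    # Toy daily news snippets and trading volumes (assumptions for illustration).
    news_per_day = [
        "central bank raises rates amid inflation fears",
        "tech earnings beat expectations on cloud growth",
        "oil prices surge after supply disruption hits refiners",
        "merger talks between major retailers collapse abruptly",
    ]
    trading_volume = np.array([1.8e6, 2.3e6, 1.5e6, 2.9e6])

    # 1) decompose the news into "thematic" features with a simple topic model
    counts = CountVectorizer(stop_words="english").fit_transform(news_per_day)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_exposure = lda.fit_transform(counts)          # shape: (days, topics)

    # 2) regularized regression of log trading activity on the topic exposures
    reg = Lasso(alpha=0.1).fit(topic_exposure, np.log(trading_volume))
    print("per-topic coefficients:", reg.coef_)
    ```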

    Prediction of ESG Compliance using a Heterogeneous Information Network

    Negative screening is one method of avoiding interactions with inappropriate entities. For example, financial institutions keep investment exclusion lists of inappropriate firms that have environmental, social, and governance (ESG) problems. They create their investment exclusion lists by gathering information from various news sources to keep their portfolios both profitable and green. International organizations also maintain smart sanctions lists that are used to prohibit trade with entities involved in illegal activities. In the present paper, we focus on the prediction of investment exclusion lists in the finance domain. We construct a vast heterogeneous information network that covers the necessary information surrounding each firm, assembled from seven professionally curated datasets and two open datasets, resulting in approximately 50 million nodes and 400 million edges in total. Exploiting these vast datasets, and motivated by how professional investigators and journalists undertake their daily investigations, we propose a model that can learn to predict which firms are more likely to be added to an investment exclusion list in the near future. Our approach is tested using the negative-news investment exclusion list data of more than 35,000 firms worldwide from January 2012 to May 2018. Compared with state-of-the-art methods with and without the network, we show that the predictive accuracy is substantially improved when using the vast information stored in the heterogeneous information network. This work suggests new ways to consolidate the diffuse information contained in big data to monitor dominant firms on a global scale for better risk management and more socially responsible investment.
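
    A minimal sketch, under assumed node and edge types, of what assembling a heterogeneous information network and deriving a simple firm-level feature from it can look like; the entities, relations, and the adverse_exposure helper are hypothetical and not taken from the paper.

    ```python
    import networkx as nx

    # Typed nodes and edges (firm, person, adverse news event) assembled into one
    # heterogeneous graph; the entities and relations below are made up.
    G = nx.MultiDiGraph()
    G.add_node("firm:ACME", type="firm")
    G.add_node("firm:GLOBEX", type="firm")
    G.add_node("person:JDoe", type="person")
    G.add_node("news:e42", type="adverse_news")

    G.add_edge("person:JDoe", "firm:ACME", type="officer_of")
    G.add_edge("person:JDoe", "firm:GLOBEX", type="officer_of")
    G.add_edge("news:e42", "firm:GLOBEX", type="mentions")

    def adverse_exposure(graph, firm, max_hops):
        """Count adverse-news nodes within max_hops of a firm (a toy feature)."""
        reachable = nx.single_source_shortest_path_length(
            graph.to_undirected(), firm, cutoff=max_hops)
        return sum(1 for n in reachable
                   if graph.nodes[n].get("type") == "adverse_news")

    # GLOBEX has direct adverse coverage; ACME is exposed only indirectly,
    # three hops away via the shared officer.
    print(adverse_exposure(G, "firm:GLOBEX", max_hops=1))  # -> 1
    print(adverse_exposure(G, "firm:ACME", max_hops=3))    # -> 1
    ```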

    Predicted and Verified Deviations from Zipf's law in Ecology of Competing Products

    Zipf's power-law distribution is a generic empirical statistical regularity found in many complex systems. However, rather than universality with a single power-law exponent (equal to 1 for Zipf's law), there are many reported deviations that remain unexplained. A recently developed theory finds that the interplay between (i) one of the most universal ingredients, namely stochastic proportional growth, and (ii) birth and death processes leads to a generic power-law distribution with an exponent that depends on the characteristics of each ingredient. Here, we report the first complete empirical test of the theory and its application, based on the empirical analysis of the dynamics of market shares in the product market. We estimate directly the average growth rate of market shares and its standard deviation, the birth rates, and the "death" (hazard) rate of products. We find that the temporal variations and product differences of the observed power-law exponents can be fully captured by the theory with no adjustable parameters. Our results can be generalized to many systems for which the statistical properties revealed by power-law exponents are directly linked to the underlying generating mechanism.
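
    A toy Monte Carlo sketch of the two ingredients named above, stochastic proportional growth plus birth and death of products, with a rough Hill estimate of the resulting tail exponent; all parameter values are arbitrary assumptions, not the paper's calibration.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    mu, sigma = 0.0, 0.10          # mean and volatility of log growth per step
    birth_rate, death_rate = 0.02, 0.02
    steps, n0 = 500, 500

    sizes = list(rng.lognormal(0.0, 0.5, n0))
    for _ in range(steps):
        # (i) proportional growth: each product's share gets a random multiplier
        sizes = [s * np.exp(rng.normal(mu, sigma)) for s in sizes]
        # (ii) birth and death keep the product population turning over
        sizes = [s for s in sizes if rng.random() > death_rate]
        n_births = rng.poisson(birth_rate * len(sizes))
        sizes += list(rng.lognormal(0.0, 0.5, n_births))

    x = np.sort(sizes)[::-1]
    k = max(10, len(x) // 10)                    # largest k observations
    hill = k / np.sum(np.log(x[:k] / x[k]))      # rough Hill tail-exponent estimate
    print(f"{len(x)} products, estimated tail exponent ~ {hill:.2f}")
    ```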

    Sales Distribution of Consumer Electronics

    Using the uniformly most powerful unbiased test, we examined the daily sales distribution of consumer electronics in Japan and report that it follows either a lognormal or a power-law distribution depending on the state of the market, and that switches between the two occur quite often. The underlying sales dynamics found in both regimes closely matches a multiplicative process. However, whereas the multiplicative term in the process displays a size-dependent relationship when the steady lognormal distribution holds, it shows a size-independent relationship when the power-law distribution holds. This difference in the underlying dynamics is responsible for the difference between the two observed distributions.
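
    A toy illustration of the distinction drawn above between a size-dependent and a size-independent multiplicative term; the specific functional forms used below (shock volatility shrinking with size versus identical shocks with a lower reflecting bound) are textbook-style assumptions chosen for contrast, not the paper's estimated dynamics.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, steps = 20000, 400

    s_dep = np.ones(n)   # regime with a size-DEPENDENT multiplicative term
    s_ind = np.ones(n)   # regime with a size-INDEPENDENT term (reflected at 1)

    for _ in range(steps):
        # size-dependent: shock volatility shrinks as the item grows
        vol = 0.2 / np.sqrt(1.0 + np.log1p(s_dep))
        s_dep *= np.exp(rng.normal(0.0, vol))
        # size-independent: identical shocks for every size, lower bound at 1
        s_ind = np.maximum(s_ind * np.exp(rng.normal(-0.01, 0.2, n)), 1.0)

    for name, s in [("size-dependent term", s_dep), ("size-independent term", s_ind)]:
        x = np.sort(s)[::-1][: n // 100]               # top 1% of the sample
        hill = len(x) / np.sum(np.log(x / x[-1]))      # rough tail-exponent estimate
        print(f"{name}: tail-exponent estimate ~ {hill:.2f}")
    ```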

    Predicting Adverse Media Risk using a Heterogeneous Information Network

    The media plays a central role in monitoring powerful institutions and identifying any activities harmful to the public interest. In the investment sphere, comprising 46,583 officially listed domestic firms on stock exchanges worldwide, there is a growing interest in "doing the right thing", i.e., in putting pressure on companies to improve their environmental, social, and governance (ESG) practices. However, how can one overcome the sparsity of ESG data from non-reporting firms, and how can the relevant information be identified in the annual reports of this large universe? Here, we construct a vast heterogeneous information network that covers the necessary information surrounding each firm, assembled from seven professionally curated datasets and two open datasets, resulting in about 50 million nodes and 400 million edges in total. Exploiting this heterogeneous information network, we propose a model that can learn from past adverse media coverage patterns and predict the occurrence of future adverse media coverage events for the whole universe of firms. Our approach is tested using the adverse media coverage data of more than 35,000 firms worldwide from January 2012 to May 2018. Compared with state-of-the-art methods with and without the network, we show that the predictive accuracy is substantially improved when using the heterogeneous information network. This work suggests new ways to consolidate the diffuse information contained in big data in order to monitor dominant institutions on a global scale for more socially responsible investment, better risk management, and the surveillance of powerful institutions.
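
    A hedged sketch of the evaluation setup implied above: fit a classifier on graph-derived firm features from an earlier window, predict adverse coverage in a later window, and score the ranking with precision@k. The features, classifier, and data are illustrative assumptions only, not the paper's model.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n_firms = 5000

    # Toy graph-derived firm features (e.g. adverse mentions among neighbours,
    # shared officers with flagged firms, sector risk); all made up.
    X = rng.normal(size=(n_firms, 3))
    # Toy label: adverse coverage in the NEXT period, loosely tied to the features.
    y = (X @ np.array([1.0, 0.5, 0.0]) + rng.normal(0, 1, n_firms) > 2.0).astype(int)

    # Time-based split: fit on the earlier window, evaluate on the later one.
    train, test = slice(0, 4000), slice(4000, n_firms)
    clf = LogisticRegression().fit(X[train], y[train])
    scores = clf.predict_proba(X[test])[:, 1]

    k = 100                                    # firms flagged for analyst review
    top_k = np.argsort(scores)[::-1][:k]
    precision_at_k = y[test][top_k].mean()
    print(f"precision@{k}: {precision_at_k:.2f}")
    ```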